NoSQL operator: updtable

Inserts/updates/deletes table rows based on the contents
of an edit table.

Usage: updtable [options] table_1 < table_2

Options:
    --no-header (-N)
      Strip the table header from output.

    --help (-h)
      Display this help text.

    --key-columns (-K) 'column'
      Use 'column' as the key column from both tables,
      where 'column' can be a list of comma-separated
      column names, that is: 'column1,column2,...'

    --write-size (-w) 'file'
      The number of records found in the update table is
      written to 'file' upon exit. This can be used by
      an invoking program as an indication that it is
      time to permanently merge the updates into the
      main table (table reorganisation).

    --no-insert (-n)
      Ignore records from 'table_1' that do not match
      any records from 'table_2' on the key(s) column.

    --last (-l)
      If the input tables contain duplicated column names
      pick the last occurrence of each. The default is to
      pick the first one. This is sometimes useful after
      the 'jointable' operator.

    --stdin (-s)
      In the normal case, 'table_1' is the table with the
      updates, while 'table_2' is the one being updated.
      This option exchanges the two tables, thus making
      the program expect to read the updates from stdin
      and the main table from a file.

    --fallback (-b)
      By default, an empty non-key field in 'table_1' will
      result in the coresponding field to be cleared also
      on output. If this option is specified, then empty
      input fields will be assigned the previous non-null
      value they had in a preceeding instance, either from
      the update table 'table_1' or from the original table
      'table_2' as a last resort. To explicitly set such
      fields to null on output even in fallback mode, the
      special string "..NIL.." must be used as their value
      in 'table_1'.

Notes:

This operator updates/deletes rows in 'table_2', based on the rows
in 'table_1' that match on the key column(s). Neither table needs to
be sorted nor be unique on any particular fields. If more than one
record with the same key value is present in 'table_1', the last
occurrence takes over. Fields of 'table_1' that are not present in
'table_2' are silently ignored. The key column(s) content is always
taken from 'table_2' regardless of the value that the same field(s) has
in 'table_1'. This is because, by definition, an update operation may
not change the value of existing key(s). The fact that 'updtable' can
also be used to insert new keys and delete old ones (see below) are
handy exceptions added by NoSQL to this rule.

If no key column is specified on the command-line (option '-K') then
'updtable' will try and infer the key column(s) from the one of the two
tables which is not on STDIN. The file-name is expected to be structured
like this:

              somename._k.keycol1[.keycol2 ...][-suffix]

where 'keycol1' is one key column, 'keycol2' and possibly others are
the other key columns, where applicable, and "-suffix" in an optional
trailer that is always removed from the file name for the computation.
If the key column(s) cannot be derived from the file-name, the key
column will default to the leftmost field in both tables, and the two
fields are required to have the same name.

One way or another, if multiple key columns are used, they must be
listed in the same order in which they occur in the input data. That is,
key columns must occur in the same order in both tables, although not
necessarily at the very same column positions.

An existing row in 'table_2' will be deleted if there is a key column(s)
match and any of the non-key columns are equal to the delete string,
"..DEL.." (without the quotes).

Records in 'table_1' that do not have corresponding key values in
'table_2', and that do not contain the delete string in any non-key
positions, are appended to the output stream. Again, fields in 'table_1'
that do not have corresponding ones in 'table_2' are ignored (unless
they contain the delete string, in which case it will be honoured as
usual). Note that in this case the output stream may no longer be sorted
on key value(s), therefore an additional pass through 'sorttable' may be
needed afterwards. Such further sorting MUST NOT be done if the output
table is to be used as the edit buffer associated with a third (possibly
indexed) table, because records in an edit buffer must be sorted in
chronological (i.e. FIFO) order instead of the usual key order.

The header for the resulting table is taken from 'table_2'. The output
sorting will be the same of the key column(s) value of 'table_2'.

One unintended but rather handy property of 'updtable' is that it does
not complain if the edit table contains short/long records, i.e. records
that have a different number of fields than the header. This turns
out to be useful in that it allows the edit table records to contain
only those fields that need to be updated, without bothering about
the others, and without the need to pad short records with TABs. Such
property of 'updtable', although incidental, should be retained also in
future releases of the program, due to its usefulness.

Hint: if all you need is to delete records that match a list of primary
keys, then a better/faster alternative would be to use the 'filtertable'
operator, like this:

      filtertable -- grep -v -f remove_list < input_table

where 'remove_list' is a file containing one key per line, each
prepended by a caret (^) and followed by a TAB, to make sure that it
matches the table leftmost field, that is the primary key.
Back